A Divergence Critic
Author
Abstract
Inductive theorem provers often diverge. This paper describes a critic which monitors the construction of inductive proofs, attempting to identify diverging proof attempts. The critic proposes lemmas and generalizations which hopefully allow the proof to go through without divergence. The critic enables the system SPIKE to prove many theorems completely automatically from the definitions alone.
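As a concrete illustration of the problem (a standard example, not necessarily the one used in the paper): proving len(rev(x)) = len(x), with rev defined via the append function app, normally needs the lemma len(app(x, y)) = len(x) + len(y). Without that lemma, a rewriting-based prover can generate an unbounded sequence of subgoals in which the list argument keeps accumulating one more append of a singleton list:

    len(app(rev(t), cons(h, nil))) = s(len(t))
    len(app(app(rev(t'), cons(h', nil)), cons(h, nil))) = s(s(len(t')))
    ...

The repeated, growing context wrapped around the induction variable is exactly the kind of pattern a divergence critic looks for.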
Similar references
A Divergence Critic for Inductive Proof
Inductive theorem provers often diverge. This paper describes a simple critic, a computer program which monitors the construction of inductive proofs, attempting to identify diverging proof attempts. Divergence is recognized by means of a "difference matching" procedure. The critic then proposes lemmas and generalizations which "ripple" these differences away so that the proof can go through with...
A Divergence Critic
Inductive theorem provers often diverge. This paper describes a simple critic, a computer program which monitors the construction of inductive proofs, attempting to identify diverging proof attempts. Divergence is recognized by means of a "difference matching" procedure. The critic then proposes lemmas and generalizations which "ripple" these differences away so that the proof can go through with...
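To make the idea of spotting such an accumulating difference concrete, the toy sketch below (our illustration, not the paper's difference-matching procedure; the term representation and function names are made up) represents terms as nested tuples and flags a proof attempt as suspicious when the same non-trivial context separates every pair of consecutive subgoals:

    # Toy divergence check: terms are nested tuples such as
    # ('app', 'x', ('cons', 'h', 'nil')); all names are illustrative.
    HOLE = '?'

    def contexts(term, target):
        """Yield copies of `term` with one occurrence of `target` replaced by HOLE."""
        if term == target:
            yield HOLE
            return
        if isinstance(term, tuple):
            for i, arg in enumerate(term):
                for ctx in contexts(arg, target):
                    yield term[:i] + (ctx,) + term[i + 1:]

    def difference(smaller, larger):
        """All contexts by which `larger` extends `smaller`."""
        return set(contexts(larger, smaller))

    def looks_divergent(goals):
        """Suspect divergence when the same non-trivial context separates
        every pair of consecutive goals (an accumulating difference)."""
        diffs = [difference(a, b) for a, b in zip(goals, goals[1:])]
        if not diffs:
            return False
        common = set.intersection(*diffs)
        return bool(common - {HOLE})

    # Successive goals in which an extra app(..., cons(h, nil)) keeps accumulating:
    g1 = 'x'
    g2 = ('app', 'x', ('cons', 'h', 'nil'))
    g3 = ('app', ('app', 'x', ('cons', 'h', 'nil')), ('cons', 'h', 'nil'))
    print(looks_divergent([g1, g2, g3]))  # True

On the accumulating-append goals it prints True; on a sequence whose goals do not embed one another, the differences are empty and it prints False.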
Fully probabilistic control design in an adaptive critic framework
An optimal stochastic controller pushes the closed-loop behavior as close as possible to the desired one. The fully probabilistic design (FPD) uses a probabilistic description of the desired closed loop and minimizes the Kullback-Leibler divergence of the closed-loop description from the desired one. Practical exploitation of the fully probabilistic design control theory continues to be hindered by the co...
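For reference (notation ours), the quantity minimized in FPD is the Kullback-Leibler divergence of the achievable closed-loop description f from the desired (ideal) description f^d,

    D_{\mathrm{KL}}(f \,\|\, f^{\mathrm{d}}) \;=\; \int f(x)\,\ln\frac{f(x)}{f^{\mathrm{d}}(x)}\,\mathrm{d}x,

minimized over the admissible randomized control laws that enter f.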
Reinforcement Learning by Value Gradients
The concept of the value-gradient is introduced and developed in the context of reinforcement learning, for deterministic episodic control problems that use a function approximator and have a continuous state space. It is shown that by learning the value-gradients, instead of just the values themselves, exploration or stochastic behaviour is no longer needed to find locally optimal trajectories....
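In outline (notation ours, for the deterministic discrete-time setting the abstract describes): with dynamics x_{t+1} = f(x_t, a_t), policy a_t = \pi(x_t), and value V(x_t) = r(x_t, a_t) + V(x_{t+1}), the chain rule gives a recursion for the value-gradient itself, which can then be learned in place of V (gradients written as row vectors):

    \frac{dV}{dx_t} \;=\; \frac{\partial r}{\partial x_t} + \frac{\partial r}{\partial a_t}\frac{\partial \pi}{\partial x_t}
      \;+\; \frac{dV}{dx_{t+1}}\left(\frac{\partial f}{\partial x_t} + \frac{\partial f}{\partial a_t}\frac{\partial \pi}{\partial x_t}\right).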
f-Divergence constrained policy improvement
To ensure stability of learning, state-of-the-art generalized policy iteration algorithms augment the policy improvement step with a trust region constraint bounding the information loss. The size of the trust region is commonly determined by the Kullback-Leibler (KL) divergence, which not only captures the notion of distance well but also yields closed-form solutions. In this paper, we conside...
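For concreteness (notation ours; the divergence direction and exact constraint vary between formulations): for a convex f with f(1) = 0, the f-divergence between the new and old policies at a state s is

    D_f(\pi \,\|\, \pi_{\text{old}})(s) \;=\; \sum_a \pi_{\text{old}}(a \mid s)\, f\!\left(\frac{\pi(a \mid s)}{\pi_{\text{old}}(a \mid s)}\right),

and a trust-region policy-improvement step takes the form

    \max_{\pi}\; \mathbb{E}_{s,\,a \sim \pi}\big[A^{\pi_{\text{old}}}(s, a)\big] \quad \text{s.t.} \quad \mathbb{E}_s\big[D_f(\pi \,\|\, \pi_{\text{old}})(s)\big] \le \epsilon,

with the KL case recovered by choosing f(t) = t \ln t.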